
<!DOCTYPE html>
<html>
  <head>
    
<meta charset="utf-8" >

<title>业界CTR深度学习框架的一些新的进展  | dragon</title>
<meta name="description" content="邮箱(base64)：MTY5MDMwMjk2M0BxcS5jb20=
">

<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/animate.css/3.7.0/animate.min.css">

<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.2/css/all.css" integrity="sha384-fnmOCqbTlWIlj8LyTjo7mOUStjsKC4pOpQbqyi7RrhN7udi9RwhKkMHpvLbHG9Sr" crossorigin="anonymous">
<link rel="shortcut icon" href="https://dragonfive.gitee.io//favicon.ico?v=1740893463017">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.10.0/katex.min.css">
<link rel="stylesheet" href="https://dragonfive.gitee.io//styles/main.css">



<script src="https://cdn.jsdelivr.net/npm/vue/dist/vue.js"></script>
<script src="//cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.5.1/build/highlight.min.js"></script>



  </head>
  <body>
    <div id="app" class="main">
      <div class="site-header-container">
  <div class="site-header">
    <div class="left">
      <a href="https://dragonfive.gitee.io/">
        <img class="avatar" src="https://dragonfive.gitee.io//images/avatar.png?v=1740893463017" alt="" width="32px" height="32px">
      </a>
      <a href="https://dragonfive.gitee.io/">
        <h1 class="site-title">dragon</h1>
      </a>
    </div>
    <div class="right">
      <transition name="fade">
        <i class="icon" :class="{ 'icon-close-outline': menuVisible, 'icon-menu-outline': !menuVisible }" @click="menuVisible = !menuVisible"></i>
      </transition>
    </div>
  </div>
</div>

<transition name="fade">
  <div class="menu-container" style="display: none;" v-show="menuVisible">
    <div class="menu-list">
      
        
          <a href="/" class="menu purple-link">
            首页
          </a>
        
      
        
          <a href="/archives" class="menu purple-link">
            归档
          </a>
        
      
        
          <a href="/tags" class="menu purple-link">
            标签
          </a>
        
      
        
          <a href="/post/about" class="menu purple-link">
            关于
          </a>
        
      
    </div>
  </div>
</transition>


      <div class="content-container">
        <div class="post-detail">
          
          <h2 class="post-title">业界CTR深度学习框架的一些新的进展 </h2>
          <div class="post-info post-detail-info">
            <span><i class="icon-calendar-outline"></i> 2021-03-31</span>
            
              <span>
                <i class="icon-pricetags-outline"></i>
                
                  <a href="https://dragonfive.gitee.io/tag/WzibKNMac/">
                    深度学习
                    
                  </a>
                
              </span>
            
          </div>
          <div class="post-content" v-pre>
            <p>为了充分利用GPU的能力和高速带宽<br>
英伟达的 hugeCtr https://github.com/NVIDIA/HugeCTR 和 脸书 的 DLRM<br>
<a href="https://arxiv.org/abs/1906.00091">【CoRR2019】Deep Learning Recommendation Model for Personalization and Recommendation Systems</a><br>
把emb参数分成不同的份放在GPU HMB中，需要需要昂贵的GPU，不实用。</p>
<p>腾讯的DES<br>
<a href="https://arxiv.org/abs/1909.04823">Distributed Equivalent Substitution Training for Large-Scale Recommender Systems</a><br>
和 百度的 HierPs<br>
<a href="https://arxiv.org/abs/2003.05622">Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads System</a><br>
使用主存保存emb ，DES采用 field-aware 分片策略来reduce(减少or规约)GPU间的数据通信，但没有进行主存和GPU之间通信优化。<br>
HierPS使用大batch策略来在gpu中缓存使用大参数，以此减少传输延时。</p>
<p>Tensorflow, MxNet 和 PyTorch 并不能很好的支持大规模embedding的训练:</p>
<ul>
<li>PyTorch 中没有官方支持的ps</li>
<li>Mxnet 支持的模型大小因为实现问题受到了限制</li>
<li>tensorflow 使用它的ps后吞吐会严重下降</li>
</ul>
<p>为了提升tensorflow, mxnet, pytorch较差的分布式性能，uber 的horovod<br>
<a href="https://arxiv.org/abs/1802.05799">Horovod: fast and easy distributed deeplearninginTensorFlow</a><br>
和字节的byteps<br>
<a href="https://arxiv.org/abs/1807.00311">Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data</a><br>
都支持不同的平台:</p>
<ul>
<li>horovod 使用Ring-AllReduce实现来加速dense模型的训练</li>
<li>byteps 通过调度优先来在不同的层加速同步参数，优化不同层的顺序来在反向传播和前向计算的时候同步参数</li>
</ul>
<p>之前写过一篇关于 horovod 的知识总结：<a href="https://dragonfive.github.io/post/uber-de-horovod/">uber的Horovod | dragon</a></p>

          </div>
        </div>

        
          <div class="next-post">
            <a class="purple-link" href="https://dragonfive.gitee.io/post/byteps-lun-wen-fan-yi-yu-shang-xi/">
              <h3 class="post-title">
                下一篇：byteps 论文翻译与赏析
              </h3>
            </a>
          </div>
          
      </div>

      

      <div class="site-footer">
  <div class="slogan">邮箱(base64)：MTY5MDMwMjk2M0BxcS5jb20=
</div>
  <div class="social-container">
    
      
        <a href="https://github.com/DragonFive" target="_blank">
          <i class="fab fa-github"></i>
        </a>
      
    
      
    
      
    
      
    
      
    
  </div>
  Powered by <a href="https://github.com/getgridea/gridea" target="_blank">Gridea</a> | <a class="rss" href="https://dragonfive.gitee.io//atom.xml" target="_blank">RSS</a>
</div>


    </div>
    <script type="application/javascript">

hljs.initHighlightingOnLoad()

var app = new Vue({
  el: '#app',
  data: {
    menuVisible: false,
  },
})

</script>




  </body>
</html>
